Random variables, stochastic processes and simulation.

import numpy as np
import scipy 

from numpy.random import normal, choice, uniform

import ipywidgets as widgets
import matplotlib.pyplot as plt
plt.style.use('fast')

%matplotlib inline

Random numbers, probability distributions and statistical analysis

  • numpy.random provides fast random number generators implemented in low-level C code.

  • scipy.stats offers an extensive library of statistical distributions and tools for statistical analysis.

  • statsmodels extends scipy with additional statistical modeling tools.

  • seaborn enhances matplotlib for statistical visualization.

General overview of random numbers in Python

First we take a look at the most widely used random numbers of numpy, also called standard random numbers. These are rand (uniform random numbers on the interval (0, 1)) and randn (standard normal random numbers with mean 0 and variance 1).

  • When running code that uses random numbers, the results will be different on every run. If you want the code to reproduce the same results, you can fix the seed: np.random.seed(8376743)
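A minimal sketch of seeding in action: fixing the seed before each draw makes the "random" sequence reproducible (the seed value here is the one from the bullet above).

```python
import numpy as np

# Fix the seed: the same seed always yields the same sequence
np.random.seed(8376743)
a = np.random.rand(3)

# Re-seed with the same value and draw again
np.random.seed(8376743)
b = np.random.rand(3)

print(np.allclose(a, b))  # True: the two sequences are identical
```

Without the second call to np.random.seed, the draws `a` and `b` would differ.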

\[\begin{split}f(x)=\begin{cases} {\frac {1}{b-a}}&\mathrm {for} \ a\leq x\leq b,\\[8pt]0&\mathrm {for} \ x<a\ \mathrm {or} \ x>b \end{cases} \end{split}\]
# Generates standard uniform random numbers U(0,1)
r = np.random.rand(10) 

r
array([0.75837541, 0.70300344, 0.39583868, 0.01087556, 0.44637723,
       0.2457634 , 0.00973097, 0.54637   , 0.4948939 , 0.95468676])
def rnplot(r):
    '''Convenience function for making a quick two-panel plot showing
    a line plot of the sequence of random numbers (RN) and
    a histogram of the probability density of the random numbers
    '''
    
    fig, ax = plt.subplots(ncols=2) 

    ax[0].plot(r,  color='blue', label='trajectory')
    ax[1].hist(r,  density=True, color='red',  label = 'histogram')
    
    
    ax[0].set_xlabel('Samples of RN')
    ax[0].set_ylabel('Values of RN')
    
    ax[1].set_xlabel('Values of RN')
    ax[1].set_ylabel('Probability Density')

    fig.legend();
    fig.tight_layout()
r = np.random.rand(200) 
rnplot(r)
../_images/02_RandVar_8_0.png

In the same way we generate and visualize normally distributed random numbers, \(N(0,1)\)

\[P(x |\mu=0, \sigma=1) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
r = np.random.randn(200) 

rnplot(r)
../_images/02_RandVar_10_0.png
  • We showed examples of standard random numbers \(U(0,1)\) and \(N(0,1)\)

  • Parametrized random numbers allow you to set parameters such as the mean or the interval length, and can therefore be viewed as generalized versions of the standard random numbers. Below we look at examples of continuous (normal and uniform RVs) and discrete random numbers generated from parametrized distributions.

r = np.random.uniform(low=-1, high=1, size=(5, 200))
print(r.shape)
rnplot(r[0,:])
(5, 200)
../_images/02_RandVar_12_1.png
r=np.random.normal(loc=8, scale=10, size=(2, 2000))
print(r.shape)
rnplot(r[0,:])
(2, 2000)
../_images/02_RandVar_13_1.png
r = np.random.binomial(n=10, p=0.6, size=2000) 

print(r.shape)
rnplot(r)
(2000,)
../_images/02_RandVar_14_1.png

Using random numbers to get answers via simulations

One of the major uses of random numbers is conducting numerical simulations. What is a simulation? It is a recreation of a process on a computer, and this recreation is driven by random numbers. E.g., to simulate coin tosses, die throws, diffusion of molecules, or conformational changes of polymers, we use random numbers to recreate the process on a computer. Let's start off by asking some simple questions:

  • How often do we get a run of 5 or more consecutive heads in 100 coin tosses if we repeat the experiment 1000 times?

  • What if the coin is biased to generate heads only 40% of the time?

We will use np.random.choice, which generates random samples from a finite set of options.

np.random.choice(a, size=None, replace=True, p=None)
  • a: an array/list of choices, e.g. [-1, 1], [1, 2, 3] or ['bagel', 'muffin']

  • p: probabilities of picking the choices in a, e.g. p=[0.5, 0.5]. The default is equal probabilities.

  • size: shape of the output array; can be N, (N, M), (N, M, Q), etc.

  • replace=True: sampling with replacement. If you pick a bagel from a, the café puts a new one back on the shelf.

L = 100    # length of each trajectory
N = 1      # number of experiments: stochastic trajectories generated

xs = np.random.choice([0,1], size=(L, N)) # (i) unbiased coin, p=[0.5, 0.5] by default

ys = np.random.choice([0,1], size=(L, N), p=[0.9, 0.1])  # (ii) biased coin: heads (1) only 10% of the time
fig, ax = plt.subplots(nrows=2)


ax[0].plot(xs, 'o', alpha=0.5, label  =  'data1')
ax[0].plot(ys, 'o', alpha=0.5, label =  'data2')

ax[1].hist((xs[:,0], ys[:,0]), 2, label = ("data1", "data2"))

fig.legend();
../_images/02_RandVar_19_0.png
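The run-of-heads questions posed above can be answered with a short simulation of this kind; the helper name has_run and the 40%-heads bias are illustrative choices, a sketch rather than the notebook's own solution.

```python
import numpy as np

def has_run(flips, k=5):
    """True if the 0/1 sequence contains a run of k or more 1s (heads)."""
    count = 0
    for f in flips:
        count = count + 1 if f == 1 else 0
        if count >= k:
            return True
    return False

n_tosses = 100   # coin tosses per experiment
n_exp    = 1000  # number of repeated experiments

fair   = np.random.choice([0, 1], size=(n_exp, n_tosses))               # fair coin
biased = np.random.choice([0, 1], size=(n_exp, n_tosses), p=[0.6, 0.4]) # heads 40% of the time

frac_fair   = np.mean([has_run(x) for x in fair])
frac_biased = np.mean([has_run(x) for x in biased])

print('fair coin:  ', frac_fair)
print('biased coin:', frac_biased)
```

For the fair coin the fraction of experiments containing a run of 5 or more heads comes out around 0.8; biasing the coin toward tails makes long runs of heads noticeably rarer.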

Simulating a 1D unbiased random walk

In the course of simulating random walks we will be generating multidimensional numpy arrays. We will adhere to the convention that:

  • Rows are the number of measurements, or samples

  • Columns are the number of distinct observables/trajectories

  • We then take the cumulative sum over a trajectory [a, b, c, …], which accumulates the random walker's position over time: [a, a+b, a+b+c, …]. This is done with the convenient np.cumsum() method.
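As a minimal illustration of this convention, here is cumsum applied to two short trajectories stored as columns:

```python
import numpy as np

# Each column is one trajectory; cumulative sum runs along axis 0 (rows = time steps)
steps = np.array([[ 1,  1],
                  [-1,  1],
                  [ 1, -1]])

positions = steps.cumsum(axis=0)
print(positions)
# [[ 1  1]
#  [ 0  2]
#  [ 1  1]]
```

Each column accumulates independently: the first walker goes 1 → 0 → 1, the second 1 → 2 → 1.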

def rw_1d(L, N):
    '''
    L: trajectory length
    N: number of trajectories
    returns np.array of shape (L, N)
    '''
    
    # Create random walks 
    r  = choice([-1,1], size=(L, N))
    
    #Accumulate position
    rw = r.cumsum(axis=0)

    #Set initial position 
    rw[0,:]=0 
    
    return rw
rw = rw_1d(2000, 1000)

print(rw.shape)
L = rw.shape[0]
N = rw.shape[1]
(2000, 1000)
# Simulate 1D random walk
Tmax = 1000
N    = 1000 
rw   = rw_1d(Tmax, N)

@widgets.interact(t=(1, Tmax-1))
def rw_plotter(t=1):
    
    fig, ax = plt.subplots(nrows=2)

    ax[0].plot(rw)
    ax[0].axvline(x=t, color='black', linestyle='-', lw=3)
    ax[1].hist(rw[t, :], density=True)

    ## Plot gaussian with width t**0.5
    x = np.linspace(-100,100, 1000)
    y = scipy.stats.norm.pdf(x, 0, np.sqrt(t))
    ax[1].plot(x,y)  

    ax[0].set_ylabel('Position')
    ax[0].set_title('RW trajectories');

    ax[1].set_xlabel('Position')
    ax[1].set_ylabel('Histogram')
    ax[1].set_xlim([-100, 100])
    
    fig.tight_layout()
../_images/02_RandVar_23_1.png

Connection with diffusion: Mean square displacement

\[ \langle X(t)^2 \rangle^{1/2} = \left( \frac{1}{N}\sum_{n=1}^{N} X_n(t)^2 \right)^{1/2} \sim t^{1/2} \]
L, N = 2000, 1000
rw = rw_1d(L, N)

t = np.arange(L)
msd = np.mean(rw**2, axis=1)
rsd = msd**0.5

plt.loglog(np.arange(L), rsd, lw=3) 

plt.loglog(t, np.sqrt(t), '--')

plt.title('Root mean square displacement of a 1D random walker',fontsize=15)
plt.xlabel('Number of steps, n',fontsize=15)
plt.ylabel(r'$\langle x(n)^2 \rangle^{1/2}$',fontsize=15);
../_images/02_RandVar_26_0.png

Questions

  • Compute the MSD for a 1D random walk and show the expected scaling

  • Generate random walks starting from different positions

  • Choice between [-1, 1]

  • Multinomial choice

  • Self-avoiding walks

def rw_2d(L, N):
    
    '''2D random walk function:
    L: trajectory length
    N: number of trajectories
    returns np.array with shape (L, N, 2)
    '''
    vertices = np.array([(1,  0),
                         (0,  1),
                         (-1, 0),
                         (0, -1)])
    
    rw       = vertices[choice([0,1,2,3], size=(L, N))]
    
    rw[0, :, :] = 0
    
    return rw.cumsum(axis=0)
traj = rw_2d(L=10000, N=1000)
print(traj.shape)
(10000, 1000, 2)

Compute the root mean square distance of a random walker

\[ \langle R(t)^2 \rangle^{1/2} = \langle (X(t)^2+Y(t)^2)\rangle^{1/2} \sim t^{1/2} \]
fig, ax  = plt.subplots(nrows=2, figsize=(10,10))

#Simulate 2D random walk
L, N = 10000, 1000
traj = rw_2d(L, N)

#Compute RSD 
t            = np.arange(L)
r2           = np.sum(traj**2, axis=2)
mean_r2      = np.mean(r2, axis = 1)
rsd          = np.sqrt(mean_r2)

ax[0].plot(traj[:3000, :5, 0], traj[:3000, :5, 1]);

ax[1].loglog(t, rsd, lw=3, alpha=0.5);
ax[1].loglog(t, t**0.5, '--');

ax[0].set_title('2D random walker',fontsize=15)
ax[0].set_xlabel('X')
ax[0].set_ylabel('Y')

ax[1].set_xlabel('Number of steps, t',fontsize=15)
ax[1].set_ylabel(r'$\langle R(t)^2 \rangle^{1/2}$',fontsize=15);
../_images/02_RandVar_32_0.png

Binomial distribution of a random walker

  • In the previous examples we generated random variables without explicit reference to a probability distribution. This time we will generate random numbers from a binomial distribution.

  • We will use the scipy.stats library, which contains probability distribution functions. That is, in addition to generating random variables we can also compute probability distributions and related quantities analytically.

from scipy.stats import binom, norm, poisson  
s =  binom(10, 0.5) # Let us declare s to be a binomial RV
print(s.rvs(20))          # 20 random samples from X
print(s.pmf(5))           # P(X = 5)
print(s.cdf(5))           # P(X <= 5)
print(s.mean())           # E[X], mean
print(s.var())            # Var(X), variance
print(s.std())            # Std(X), standard deviation
[5 8 3 4 5 2 8 4 3 3 4 7 5 7 6 5 4 3 5 4]
0.24609375000000003
0.623046875
5.0
2.5
1.5811388300841898
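The values scipy reports above can be checked against the closed-form binomial moments E[X] = np and Var(X) = np(1 − p); this short sketch just confirms the agreement for n = 10, p = 0.5.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
s = binom(n, p)

# Closed-form moments of the binomial distribution
print(np.isclose(s.mean(), n * p))            # True: E[X] = np = 5.0
print(np.isclose(s.var(),  n * p * (1 - p)))  # True: Var(X) = np(1-p) = 2.5
```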
def coin_flip(L,p,N):
    
    '''
    L: flip the coin L times
    p: probability of heads
    N: repeat the experiment N times
    ''' 
    coin       = binom(L, p)   # Binomial RV
    
    coin_flips = coin.rvs(N)   # Generate sample of N points
    coin_pmf   = coin.pmf(np.arange(L+1)) # Generate PMF from 0 to L 
    coin_cdf   = coin.cdf(np.arange(L+1))
    
    return coin_flips, coin_pmf, coin_cdf
fig, ax=plt.subplots(nrows=1, ncols=3,figsize=(12,3))

#Flip a coin 20 times with p=0.5; repeat the experiment 80 times.
X1, P1, CP1 = coin_flip(20, 0.5, 80)

ax[0].plot(X1,'-',color='grey')

ax[1].hist(X1,density=True)
ax[1].plot(P1,'-o')

ax[2].plot(CP1,'-d',color='red')
[<matplotlib.lines.Line2D at 0x7f7420cad790>]
../_images/02_RandVar_38_1.png

Continuous time random walk: Brownian motion

We have learned about random variables and random walks, and have encountered the concept of a stochastic process through the example of the discrete-step 1D random walk. Now let us generate the prototypical stochastic process in continuous time: Brownian motion. Brownian motion was first observed by the botanist Robert Brown, who noticed that pollen grains in solution undergo erratic and incessant motion. To simulate Brownian motion we take the continuous-time limit of the random walk and approximate the displacements of our particle as normally distributed (binomial → normal, discrete time step → continuous time)

\[x(t+dt)-x(t)=N(0, 2D\,dt)\]

We assume we started at position \(\mu=0\) and our variance is given by \(\sigma^2=2Dt\), where D is the diffusion coefficient, which is related to the parameters of the discrete random walk as shown in the lecture.

\[x(t+dt)=x(t)+\sqrt{2D dt} \cdot N(0,1)\]

In the last step we rewrote the Brownian motion equation in a convenient way by shifting the normally distributed random variable by \(\mu\) and scaling it by \(\sigma\)

\[N(\mu, \sigma^2) = \mu + \sigma N(0,1) \]
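The shift-and-scale identity above is easy to verify empirically; this sketch draws standard normal samples and checks that the transformed sample has the requested mean and standard deviation (the values 8 and 10 echo the earlier np.random.normal example).

```python
import numpy as np

mu, sigma = 8.0, 10.0

z = np.random.randn(200_000)  # standard normal N(0, 1)
x = mu + sigma * z            # shifted and scaled: N(mu, sigma^2)

# Sample mean and std should match mu and sigma up to sampling noise
print(np.allclose([x.mean(), x.std()], [mu, sigma], atol=0.2))  # True
```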
def brown(T, N, dt=1, D=1):
    
    """
    Creates 3D brownian path given:
    time T 
    N=1 trajecotires
    dt=1 timestep
    D=1 diffusion coeff
    returns np.array with shape (N, T, 3)
    """
    
    nT = int(T/dt) # how many points to sample
    
    dR = np.sqrt(2*D*dt) * np.random.randn(N, nT, 3) # 3D position of brownian particle
    
    R = np.cumsum(dR, axis=1) # accumulated 3D position of brownian particle
    
    return R
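A quick consistency check (not part of the original notebook): for 3D Brownian motion each dimension contributes \(2Dt\), so the mean square displacement should follow \(\langle R(t)^2\rangle = 6Dt\). The sketch below repeats the same construction as brown for self-containment.

```python
import numpy as np

def brown(T, N, dt=1, D=1):
    """Same construction as above: (N, T/dt, 3) array of 3D Brownian paths."""
    nT = int(T / dt)
    dR = np.sqrt(2 * D * dt) * np.random.randn(N, nT, 3)
    return np.cumsum(dR, axis=1)

T, N, D = 500, 2000, 1
R = brown(T, N, D=D)

t   = np.arange(1, T + 1)                    # index i corresponds to i+1 accumulated steps
msd = np.mean(np.sum(R**2, axis=2), axis=0)  # average squared distance over trajectories

# The ratio MSD(t) / (6 D t) should hover around 1 at every time
print(np.allclose(msd / (6 * D * t), 1.0, atol=0.2))  # True
```

With 2000 trajectories the statistical fluctuations around \(6Dt\) are at the percent level, well within the tolerance used here.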

Brownian motion

  • Below we proceed to simulate continuous random walks in 1D–3D

  • We will plot trajectories and distributions of a Brownian particle using interactive plotting via ipywidgets and the holoviews/plotly interface.

R=brown(T=3000, N=1000)
print(R.shape)
(1000, 3000, 3)
@widgets.interact(t=(10,3000-1))
def brownian_plot(t=10):
    
    fig, ax = plt.subplots(ncols=2)
    
    ax[0].plot(R[:5, :t, 0].T, R[:5, :t, 1].T);
    
    ax[1].hist(R[:, 10, 0], density=True, color='red')
    ax[1].hist(R[:, t, 0], density=True)
    
    ax[1].set_ylim([0,0.1])
    
    ax[0].set_ylim([-200, 200])
    ax[0].set_xlim([-200, 200])
    
    fig.tight_layout()
import holoviews as hv
hv.extension('plotly')

plots = []
for i in range(10):
    
    plot = hv.Path3D(R[i,:,:], label='3D random walk').opts(width=600, height=600, line_width=5)
    plots.append(plot)
    
hv.Overlay(plots) 
rw_curve = [hv.Curve((R[i,:,0], R[i,:,1])) for i in range(10)]

xdist = hv.Distribution(R[:,10,0], ['X'], ['P(X)'])
ydist = hv.Distribution(R[:,10,1], ['Y'], ['P(Y)'])

hv.Overlay(rw_curve) << ydist << xdist

Diffusion Equation

The movement of individual random walkers \(\leftrightarrow\) density of walkers \(\rho(\vec{r},t)\)

Diffusion equation:

Formulated empirically as Fick’s laws

\[\frac{\partial\rho}{\partial t} = \mathcal{D}\nabla^2\rho\]
  • This is a 2nd-order PDE! Unlike equations of motion, the diffusion equation shows irreversible behaviour

  • This one is exactly solvable, but in general reaction-diffusion PDEs are difficult to solve analytically.

  • Can solve numerically by writing derivatives as finite differences!

  • Can also simulate via random walk!

  • Diffusion coefficient \(D\) has units of \([L^2]/[T]\)

Important special case solution (here written in 1d):

\[\rho(x,t) = \frac{1}{\sqrt{2\pi \sigma(t)^2}}\exp\left(-\frac{x^2}{2\sigma(t)^2}\right),\]

where \(\sigma(t)=\sqrt{2{D}t}\)

  • density remains Gaussian

  • Gaussian becomes wider with time

  • check that this is indeed a solution by plugging into the diffusion equation!
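The "plug it in" check in the last bullet can be automated with symbolic differentiation; this sketch uses sympy (an assumed dependency, not imported elsewhere in the notebook) to show the residual of the diffusion equation vanishes identically for the Gaussian with \(\sigma(t)^2 = 2Dt\).

```python
import sympy as sp

x, t, D = sp.symbols('x t D', positive=True)

sigma2 = 2 * D * t  # sigma(t)^2 = 2*D*t
rho = sp.exp(-x**2 / (2 * sigma2)) / sp.sqrt(2 * sp.pi * sigma2)

# Residual of the diffusion equation d(rho)/dt = D * d^2(rho)/dx^2
residual = sp.diff(rho, t) - D * sp.diff(rho, x, 2)
print(sp.simplify(residual))  # 0
```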

def sigma(t, D = 1):
    return np.sqrt(2*D*t)

def gaussian(x, t):
    return  1/np.sqrt(2*np.pi*sigma(t)**2) * np.exp(-x**2/(2*sigma(t)**2)) #
@widgets.interact(t=(1,100,1))
def diffusion(t=1):
    
    R = brown(T=101, N=1000)
    x = np.linspace(-20, 20, 100)
    
    plt.plot(x, gaussian(x, 1), '--', color='orange', label='t=0')
    
    plt.plot(x, gaussian(x, t), lw=3, color='green', label=f't={t}')
    
    plt.hist(R[:,t,0], density=True, alpha=0.6, label='simulation hist')
    
    plt.legend()
    plt.ylabel('$p(x)$')
    plt.xlabel('$x$')
    plt.xlim([-25, 25])